Results 1 - 20 of 95
1.
Patterns (N Y) ; 5(3): 100944, 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38487797

ABSTRACT

The underrepresentation of gender, racial, and ethnic minorities in clinical trials is a problem undermining the efficacy of treatments on minorities and preventing precise estimates of the effects within these subgroups. We propose FRAMM, a deep reinforcement learning framework for fair trial site selection to help address this problem. We focus on two real-world challenges: the data modalities used to guide selection are often incomplete for many potential trial sites, and the site selection needs to simultaneously optimize for both enrollment and diversity. To address the missing data challenge, FRAMM has a modality encoder with a masked cross-attention mechanism for bypassing missing data. To make efficient trade-offs, FRAMM uses deep reinforcement learning with a reward function designed to simultaneously optimize for both enrollment and fairness. We evaluate FRAMM using real-world historical clinical trials and show that it outperforms the leading baseline in enrollment-only settings while also greatly improving diversity.
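
The masked cross-attention idea mentioned above can be illustrated with a minimal sketch. This is not the FRAMM implementation: the function name `masked_attention`, the list-based inputs, and the single-query form are assumptions made for illustration. It only shows how a mask can force the attention weight of a missing modality to exactly zero.

```python
import math


def masked_attention(query, keys, values, present):
    """Scaled dot-product attention over modalities, skipping missing ones.

    query: list of floats; keys and values: one vector per modality;
    present: one bool per modality (False means the modality is missing).
    All names are hypothetical and chosen for this sketch.
    """
    if not any(present):
        raise ValueError("at least one modality must be present")
    scale = math.sqrt(len(query))
    # A missing modality gets a score of -inf, so its softmax weight is 0.
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale if ok else float("-inf")
        for key, ok in zip(keys, present)
    ]
    top = max(scores)  # finite, because at least one modality is present
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output
```

Because the masked entry contributes nothing to the weighted sum, whatever placeholder is stored for a missing modality cannot leak into the site representation.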

2.
Patterns (N Y) ; 5(3): 100945, 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38487808

ABSTRACT

While machine learning (ML) research has recently grown more in popularity, its application in the omics domain is constrained by access to sufficiently large, high-quality datasets needed to train ML models. Federated learning (FL) represents an opportunity to enable collaborative curation of such datasets among participating institutions. We compare the simulated performance of several models trained using FL against classically trained ML models on the task of multi-omics Parkinson's disease prediction. We find that FL model performance tracks centrally trained ML models, where the most performant FL model achieves an AUC-PR of 0.876 ± 0.009, 0.014 ± 0.003 less than its centrally trained variation. We also determine that the dispersion of samples within a federation plays a meaningful role in model performance. Our study implements several open-source FL frameworks and aims to highlight some of the challenges and opportunities when applying these collaborative methods in multi-omics studies.
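
To make the aggregation step in federated learning concrete, here is a minimal sketch of sample-size-weighted parameter averaging in the style of FedAvg. The abstract does not say which aggregation rule the study's frameworks used, so the rule, the function name `federated_average`, and the toy numbers are all assumptions.

```python
def federated_average(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors (FedAvg-style).

    client_weights: one list of floats per participating site.
    client_sizes: number of training samples held by each site.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical two-site federation in which the second site holds 3x the samples.
global_weights = federated_average([[0.0, 1.0], [4.0, 5.0]], [100, 300])
```

Under this rule the global model is pulled toward the larger site, which is one simple way to see why the dispersion of samples across a federation can affect performance.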

3.
NPJ Digit Med ; 7(1): 16, 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38253711

ABSTRACT

In the U.S. inpatient payment system, the Diagnosis-Related Group (DRG) is pivotal, but its assignment process is inefficient. The study introduces DRG-LLaMA, an advanced large language model (LLM) fine-tuned on clinical notes to enhance DRG assignment. Utilizing LLaMA as the foundational model and optimizing it through Low-Rank Adaptation (LoRA) on 236,192 MIMIC-IV discharge summaries, our DRG-LLaMA-7B model exhibited a noteworthy macro-averaged F1 score of 0.327, a top-1 prediction accuracy of 52.0%, and a macro-averaged Area Under the Curve (AUC) of 0.986, with a maximum input token length of 512. This model surpassed the performance of prior leading models in DRG prediction, showing a relative improvement of 40.3% and 35.7% in macro-averaged F1 score compared to ClinicalBERT and CAML, respectively. Applied to base DRG and complication or comorbidity (CC)/major complication or comorbidity (MCC) prediction, DRG-LLaMA achieved a top-1 prediction accuracy of 67.8% and 67.5%, respectively. Additionally, our findings indicate that DRG-LLaMA's performance correlates with increased model parameters and input context lengths.
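
The two headline metrics above, macro-averaged F1 and top-1 accuracy, can diverge sharply when classes are imbalanced. The following sketch uses hypothetical labels and data (not the study's) to show how each is computed.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances scores 0 by convention here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def top1_accuracy(y_true, y_pred):
    """Fraction of cases whose single predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical codes: one frequent class predicted well, two rare ones confused.
y_true = ["A", "A", "A", "A", "B", "C"]
y_pred = ["A", "A", "A", "A", "C", "B"]
```

In this toy case accuracy is 4/6 while macro F1 is only 1/3, because each rare class counts as much as the frequent one. That is one plausible reading of why a macro-averaged F1 can sit well below top-1 accuracy over a large, long-tailed label set.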

4.
bioRxiv ; 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-37986893

ABSTRACT

While machine learning (ML) research has recently grown more in popularity, its application in the omics domain is constrained by access to sufficiently large, high-quality datasets needed to train ML models. Federated Learning (FL) represents an opportunity to enable collaborative curation of such datasets among participating institutions. We compare the simulated performance of several models trained using FL against classically trained ML models on the task of multi-omics Parkinson's Disease prediction. We find that FL model performance tracks centrally trained ML models, where the most performant FL model achieves an AUC-PR of 0.876 ± 0.009, 0.014 ± 0.003 less than its centrally trained variation. We also determine that the dispersion of samples within a federation plays a meaningful role in model performance. Our study implements several open source FL frameworks and aims to highlight some of the challenges and opportunities when applying these collaborative methods in multi-omics studies.

5.
JMIR AI ; 2(1)2023.
Article in English | MEDLINE | ID: mdl-38090533

ABSTRACT

Background: Deep learning models have shown great success in automating tasks in sleep medicine by learning from carefully annotated electroencephalogram (EEG) data. However, effectively using a large amount of raw EEG data remains a challenge. Objective: In this study, we aim to learn robust vector representations from massive unlabeled EEG signals, such that the learned vectorized features (1) are expressive enough to replace the raw signals in the sleep staging task, and (2) provide better predictive performance than supervised models in scenarios involving fewer labels and noisy samples. Methods: We propose a self-supervised model, Contrast with the World Representation (ContraWR), for EEG signal representation learning. Unlike previous models that use a set of negative samples, our model uses global statistics (ie, the average representation) from the data set to distinguish signals associated with different sleep stages. The ContraWR model is evaluated on 3 real-world EEG data sets that include both settings: at-home and in-laboratory EEG recording. Results: ContraWR outperforms 4 recently reported self-supervised learning methods on the sleep staging task across 3 large EEG data sets. ContraWR also supersedes supervised learning when fewer training labels are available (eg, 4% accuracy improvement when less than 2% of data are labeled on the Sleep EDF data set). Moreover, the model provides informative, representative feature structures in 2D projection. Conclusions: We show that ContraWR is robust to noise and can provide high-quality EEG representations for downstream prediction tasks. The proposed model can be generalized to other unsupervised physiological signal learning tasks. Future directions include exploring task-specific data augmentations and combining self-supervised methods with supervised methods, building upon the initial success of self-supervised learning reported in this study.

7.
J Am Med Inform Assoc ; 31(1): 198-208, 2023 12 22.
Article in English | MEDLINE | ID: mdl-37934728

ABSTRACT

OBJECTIVES: Respiratory syncytial virus (RSV) is a significant cause of pediatric hospitalizations. This article aims to utilize multisource data and leverage tensor methods to uncover distinct RSV geographic clusters and develop an accurate RSV prediction model for future seasons. MATERIALS AND METHODS: This study utilizes 5-year RSV data from sources including medical claims, CDC surveillance data, and Google search trends. We conduct spatiotemporal tensor analysis and prediction for pediatric RSV in the United States by designing (i) a nonnegative tensor factorization model for pediatric RSV diseases and location clustering; and (ii) a recurrent neural network tensor regression model for county-level trend prediction using the disease and location features. RESULTS: We identify a clustering hierarchy of pediatric diseases: Three common geographic clusters of RSV outbreaks were identified from independent sources, showing an annual RSV trend shifting across different US regions, from the South and Southeast regions to the Central and Northeast regions and then to the West and Northwest regions, while precipitation and temperature were found as correlative factors with the coefficient of determination R² ≈ 0.5, respectively. Our regression model accurately predicted the 2022-2023 RSV season at the county level, achieving R² ≈ 0.3, mean absolute error (MAE) < 0.4, and a Pearson correlation greater than 0.75, which significantly outperforms the baselines with P-values < .05. CONCLUSION: Our proposed framework provides a thorough analysis of RSV disease in the United States, which enables healthcare providers to better prepare for potential outbreaks, anticipate increased demand for services and supplies, and save more lives with timely interventions.


Subjects
Respiratory Syncytial Virus Infections , Human Respiratory Syncytial Virus , Child , Humans , United States/epidemiology , Infant , Respiratory Syncytial Virus Infections/epidemiology , Seasons , Hospitalization , Disease Outbreaks
8.
ArXiv ; 2023 Jul 28.
Article in English | MEDLINE | ID: mdl-37576126

ABSTRACT

Clinical trials are vital in advancing drug development and evidence-based medicine, but their success is often hindered by challenges in patient recruitment. In this work, we investigate the potential of large language models (LLMs) to assist individual patients and referral physicians in identifying suitable clinical trials from an extensive selection. Specifically, we introduce TrialGPT, a novel architecture employing LLMs to predict criterion-level eligibility with detailed explanations, which are then aggregated for ranking and excluding candidate clinical trials based on free-text patient notes. We evaluate TrialGPT on three publicly available cohorts of 184 patients and 18,238 annotated clinical trials. The experimental results demonstrate several key findings: First, TrialGPT achieves high criterion-level prediction accuracy with faithful explanations. Second, the aggregated trial-level TrialGPT scores are highly correlated with expert eligibility annotations. Third, these scores prove effective in ranking clinical trials and exclude ineligible candidates. Our error analysis suggests that current LLMs still make some mistakes due to limited medical knowledge and domain-specific context understanding. Nonetheless, we believe the explanatory capabilities of LLMs are highly valuable. Future research is warranted on how such AI assistants can be integrated into the routine trial matching workflow in real-world settings to improve its efficiency.

9.
Nature ; 620(7972): 47-60, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37532811

ABSTRACT

Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment and accelerate research, helping scientists to generate hypotheses, design experiments, collect and interpret large datasets, and gain insights that might not have been possible using traditional scientific methods alone. Here we examine breakthroughs over the past decade that include self-supervised learning, which allows models to be trained on vast amounts of unlabelled data, and geometric deep learning, which leverages knowledge about the structure of scientific data to enhance model accuracy and efficiency. Generative AI methods can create designs, such as small-molecule drugs and proteins, by analysing diverse data modalities, including images and sequences. We discuss how these methods can help scientists throughout the scientific process and the central issues that remain despite such advances. Both developers and users of AI tools need a better understanding of when such approaches need improvement, and challenges posed by poor data quality and stewardship remain. These issues cut across scientific disciplines and require developing foundational algorithmic approaches that can contribute to scientific understanding or acquire it autonomously, making them critical areas of focus for AI innovation.


Subjects
Artificial Intelligence , Research Design , Artificial Intelligence/standards , Artificial Intelligence/trends , Datasets as Topic , Deep Learning , Research Design/standards , Research Design/trends , Unsupervised Machine Learning
11.
Nat Commun ; 14(1): 5305, 2023 08 31.
Article in English | MEDLINE | ID: mdl-37652934

ABSTRACT

Synthetic electronic health records (EHRs) that are both realistic and privacy-preserving offer alternatives to real EHRs for machine learning (ML) and statistical analysis. However, generating high-fidelity EHR data in its original, high-dimensional form poses challenges for existing methods. We propose Hierarchical Autoregressive Language mOdel (HALO) for generating longitudinal, high-dimensional EHR, which preserve the statistical properties of real EHRs and can train accurate ML models without privacy concerns. HALO generates a probability density function over medical codes, clinical visits, and patient records, allowing for generating realistic EHR data without requiring variable selection or aggregation. Extensive experiments demonstrated that HALO can generate high-fidelity data with high-dimensional disease code probabilities closely mirroring (above 0.9 R² correlation) real EHR data. HALO also enhances the accuracy of predictive modeling and enables downstream ML models to attain similar accuracy as models trained on genuine data.


Subjects
Electronic Health Records , Language , Humans , Likelihood Functions , Machine Learning , Privacy
12.
Lancet Digit Health ; 5(8): e495-e502, 2023 08.
Article in English | MEDLINE | ID: mdl-37295971

ABSTRACT

BACKGROUND: Epileptiform activity is associated with worse patient outcomes, including increased risk of disability and death. However, the effect of epileptiform activity on neurological outcome is confounded by the feedback between treatment with antiseizure medications and epileptiform activity burden. We aimed to quantify the heterogeneous effects of epileptiform activity with an interpretability-centred approach. METHODS: We did a retrospective, cross-sectional study of patients in the intensive care unit who were admitted to Massachusetts General Hospital (Boston, MA, USA). Participants were aged 18 years or older and had electrographic epileptiform activity identified by a clinical neurophysiologist or epileptologist. The outcome was the dichotomised modified Rankin Scale (mRS) at discharge and the exposure was epileptiform activity burden defined as mean or maximum proportion of time spent with epileptiform activity in 6 h windows in the first 24 h of electroencephalography. We estimated the change in discharge mRS if everyone in the dataset had experienced a specific epileptiform activity burden and were untreated. We combined pharmacological modelling with an interpretable matching method to account for confounding and epileptiform activity-antiseizure medication feedback. The quality of the matched groups was validated by the neurologists. FINDINGS: Between Dec 1, 2011, and Oct 14, 2017, 1514 patients were admitted to Massachusetts General Hospital intensive care unit, 995 (66%) of whom were included in the analysis. Compared with patients with a maximum epileptiform activity of 0 to less than 25%, patients with a maximum epileptiform activity burden of 75% or more when untreated had a mean 22·27% (SD 0·92) increased chance of a poor outcome (severe disability or death). Moderate but long-lasting epileptiform activity (mean epileptiform activity burden 2% to <10%) increased the risk of a poor outcome by mean 13·52% (SD 1·93). 
The effect sizes were heterogeneous depending on preadmission profile; eg, patients with hypoxic-ischaemic encephalopathy or acquired brain injury were more adversely affected compared with patients without these conditions. INTERPRETATION: Our results suggest that interventions should put a higher priority on patients with an average epileptiform activity burden 10% or greater, and treatment should be more conservative when maximum epileptiform activity burden is low. Treatment should also be tailored to individual preadmission profiles because the potential for epileptiform activity to cause harm depends on age, medical history, and reason for admission. FUNDING: National Institutes of Health and National Science Foundation.
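
The exposure definition in the methods (mean or maximum proportion of time with epileptiform activity in 6 h windows over the first 24 h) can be sketched as follows. The input representation (one fraction per hour) and the use of non-overlapping windows are assumptions; the abstract does not state how the windows were laid out, and `ea_burden` is a hypothetical name.

```python
def ea_burden(hourly_fraction, window_hours=6):
    """Return (mean, max) proportion of time with epileptiform activity per window.

    hourly_fraction: hypothetical input, one value in [0, 1] per hour of EEG.
    Windows are assumed to be consecutive and non-overlapping.
    """
    if not hourly_fraction or len(hourly_fraction) % window_hours:
        raise ValueError("recording length must be a positive multiple of the window")
    windows = [
        sum(hourly_fraction[start:start + window_hours]) / window_hours
        for start in range(0, len(hourly_fraction), window_hours)
    ]
    return sum(windows) / len(windows), max(windows)


# Hypothetical patient: continuous activity for the first 3 h, then none.
mean_burden, max_burden = ea_burden([1.0] * 3 + [0.0] * 21)
```

Here the maximum burden is 50% while the mean burden is 12.5%, which illustrates why the two summaries can place the same patient in different exposure categories.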


Subjects
Critical Illness , Patient Discharge , United States , Humans , Retrospective Studies , Cross-Sectional Studies , Treatment Outcome
13.
Nat Commun ; 14(1): 3093, 2023 05 29.
Article in English | MEDLINE | ID: mdl-37248229

ABSTRACT

In this work, we aim to accurately predict the number of hospitalizations during the COVID-19 pandemic by developing a spatiotemporal prediction model. We propose HOIST, an Ising dynamics-based deep learning model for spatiotemporal COVID-19 hospitalization prediction. By drawing the analogy between locations and lattice sites in statistical mechanics, we use the Ising dynamics to guide the model to extract and utilize spatial relationships across locations and model the complex influence of granular information from real-world clinical evidence. By leveraging rich linked databases, including insurance claims, census information, and hospital resource usage data across the U.S., we evaluate the HOIST model on the large-scale spatiotemporal COVID-19 hospitalization prediction task for 2299 counties in the U.S. In the 4-week hospitalization prediction task, HOIST achieves 368.7 mean absolute error, 0.6 [Formula: see text] and 0.89 concordance correlation coefficient score on average. Our detailed number needed to treat (NNT) and cost analysis suggest that future COVID-19 vaccination efforts may be most impactful in rural areas. This model may serve as a resource for future county and state-level vaccination efforts.
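
Two of the reported metrics, mean absolute error and the concordance correlation coefficient (CCC), can be computed as in the sketch below. It uses Lin's standard CCC definition with population (divide-by-n) variances, which is assumed rather than confirmed to match the paper's evaluation code; the example data are invented.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_correlation(y_true, y_pred):
    """Lin's concordance correlation coefficient (population variances)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both series are the same constant")
    return 2 * cov / denom


# Invented example: predictions track the truth perfectly but run one unit high.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
```

Unlike Pearson correlation, which would be 1 for this example, the CCC penalizes the systematic offset and drops to about 0.71, so it rewards predictions that match the observed counts in level as well as in trend.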


Subjects
COVID-19 , Humans , COVID-19/epidemiology , Pandemics , COVID-19 Vaccines , Factual Databases , Hospitalization
14.
Neurology ; 100(17): e1750-e1762, 2023 04 25.
Article in English | MEDLINE | ID: mdl-36878708

ABSTRACT

BACKGROUND AND OBJECTIVES: Seizures (SZs) and other SZ-like patterns of brain activity can harm the brain and contribute to in-hospital death, particularly when prolonged. However, experts qualified to interpret EEG data are scarce. Prior attempts to automate this task have been limited by small or inadequately labeled samples and have not convincingly demonstrated generalizable expert-level performance. There exists a critical unmet need for an automated method to classify SZs and other SZ-like events with expert-level reliability. This study was conducted to develop and validate a computer algorithm that matches the reliability and accuracy of experts in identifying SZs and SZ-like events, known as "ictal-interictal-injury continuum" (IIIC) patterns on EEG, including SZs, lateralized and generalized periodic discharges (LPD, GPD), and lateralized and generalized rhythmic delta activity (LRDA, GRDA), and in differentiating these patterns from non-IIIC patterns. METHODS: We used 6,095 scalp EEGs from 2,711 patients with and without IIIC events to train a deep neural network, SPaRCNet, to perform IIIC event classification. Independent training and test data sets were generated from 50,697 EEG segments, independently annotated by 20 fellowship-trained neurophysiologists. We assessed whether SPaRCNet performs at or above the sensitivity, specificity, precision, and calibration of fellowship-trained neurophysiologists for identifying IIIC events. Statistical performance was assessed by the calibration index and by the percentage of experts whose operating points were below the model's receiver operating characteristic curves (ROCs) and precision recall curves (PRCs) for the 6 pattern classes. RESULTS: SPaRCNet matches or exceeds most experts in classifying IIIC events based on both calibration and discrimination metrics. 
For SZ, LPD, GPD, LRDA, GRDA, and "other" classes, SPaRCNet exceeds the following percentages of 20 experts: ROC 45%, 20%, 50%, 75%, 55%, and 40%; PRC 50%, 35%, 50%, 90%, 70%, and 45%; and calibration 95%, 100%, 95%, 100%, 100%, and 80%, respectively. DISCUSSION: SPaRCNet is the first algorithm to match expert performance in detecting SZs and other SZ-like events in a representative sample of EEGs. With further development, SPaRCNet may thus be a valuable tool for an expedited review of EEGs. CLASSIFICATION OF EVIDENCE: This study provides Class II evidence that among patients with epilepsy or critical illness undergoing EEG monitoring, SPaRCNet can differentiate IIIC patterns from non-IIIC events and expert neurophysiologists.


Subjects
Epilepsy , Seizures , Humans , Reproducibility of Results , Hospital Mortality , Electroencephalography/methods , Epilepsy/diagnosis
15.
Kidney Int Rep ; 8(3): 489-498, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36938078

ABSTRACT

Introduction: Rehospitalization after kidney transplant is costly to patients and health care systems and is associated with poor outcomes. Few prediction model studies have examined whether inclusion of clinical notes data from the electronic medical record (EMR) enhances prediction of rehospitalization. Methods: In a retrospective, observational study of first-time, adult kidney transplant recipients at a large, urban hospital in southeastern United States (2005-2015), we examined 30-day rehospitalization (30DR) using structured EMR and unstructured (i.e., clinical notes) data. We used natural language processing (NLP) methods on 8 types of clinical notes and included terms in predictive models using unsupervised machine learning approaches. Both the area under the receiver operating curve and precision-recall curve (ROC and PRC, respectively) were used to determine and compare model accuracy, and 5-fold cross-validation tested model performance. Results: Among 2060 kidney transplant recipients, 30.7% were readmitted within 30 days. Predictive models using clinical notes did not meaningfully improve performance over previous models using structured data alone (ROC 0.6821; 95% confidence interval [CI]: 0.6644, 0.6998). Predictive models built using solely clinical notes performed worse than models using both clinical notes and structured data. The data that contributed to the top performing models were not identical but both included structured data and progress notes (ROC 0.6902; 95% CI: 0.6699, 0.7105). Conclusions: Including new features from clinical notes in risk prediction models did not substantially increase predictive accuracy for 30DR for kidney transplant recipients. Future research should consider pooling data from multiple institutions to increase sample size and avoid overfitting models.
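
For readers less familiar with the ROC metric reported above, the area under the ROC curve equals the probability that a randomly chosen readmitted patient receives a higher risk score than a randomly chosen non-readmitted patient. The sketch below computes it directly from that definition on invented data; it is not the study's evaluation code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count one half.

    labels: 1 for the positive class (e.g. readmitted), 0 otherwise.
    scores: predicted risk, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Invented example: one readmitted patient is ranked below one who was not.
example_auc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

A value of 0.5 corresponds to chance-level ranking, so the reported values near 0.68 to 0.69 indicate modest discrimination with or without the clinical-notes features.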

16.
Res Sq ; 2023 Mar 10.
Article in English | MEDLINE | ID: mdl-36945542

ABSTRACT

Synthetic electronic health records (EHRs) that are both realistic and preserve privacy can serve as an alternative to real EHRs for machine learning (ML) modeling and statistical analysis. However, generating high-fidelity and granular electronic health record (EHR) data in its original, high-dimensional form poses challenges for existing methods due to the complexities inherent in high-dimensional data. In this paper, we propose Hierarchical Autoregressive Language mOdel (HALO) for generating longitudinal high-dimensional EHR, which preserve the statistical properties of real EHR and can be used to train accurate ML models without privacy concerns. Our HALO method, designed as a hierarchical autoregressive model, generates a probability density function of medical codes, clinical visits, and patient records, allowing for the generation of realistic EHR data in its original, unaggregated form without the need for variable selection or aggregation. Additionally, our model also produces high-quality continuous variables in a longitudinal and probabilistic manner. We conducted extensive experiments and demonstrate that HALO can generate high-fidelity EHR data with high-dimensional disease code probabilities (d ≈ 10,000), disease code co-occurrence probabilities within a visit (d ≈ 1,000,000), and conditional probabilities across consecutive visits (d ≈ 5,000,000) and achieve above 0.9 R² correlation in comparison to real EHR data. In comparison to the leading baseline, HALO improves predictive modeling by over 17% in its predictive accuracy and perplexity on a hold-off test set of real EHR data. This performance then enables downstream ML models trained on its synthetic data to achieve comparable accuracy to models trained on real data (0.938 area under the ROC curve with HALO data vs. 0.943 with real data). Finally, using a combination of real and synthetic data enhances the accuracy of ML models beyond that achieved by using only real EHR data.

17.
Neurology ; 100(17): e1737-e1749, 2023 04 25.
Article in English | MEDLINE | ID: mdl-36460472

ABSTRACT

BACKGROUND AND OBJECTIVES: The validity of brain monitoring using electroencephalography (EEG), particularly to guide care in patients with acute or critical illness, requires that experts can reliably identify seizures and other potentially harmful rhythmic and periodic brain activity, collectively referred to as "ictal-interictal-injury continuum" (IIIC). Previous interrater reliability (IRR) studies are limited by small samples and selection bias. This study was conducted to assess the reliability of experts in identifying IIIC. METHODS: This prospective analysis included 30 experts with subspecialty clinical neurophysiology training from 18 institutions. Experts independently scored varying numbers of ten-second EEG segments as "seizure (SZ)," "lateralized periodic discharges (LPDs)," "generalized periodic discharges (GPDs)," "lateralized rhythmic delta activity (LRDA)," "generalized rhythmic delta activity (GRDA)," or "other." EEGs were performed for clinical indications at Massachusetts General Hospital between 2006 and 2020. Primary outcome measures were pairwise IRR (average percent agreement [PA] between pairs of experts) and majority IRR (average PA with group consensus) for each class and beyond chance agreement (κ). Secondary outcomes were calibration of expert scoring to group consensus, and latent trait analysis to investigate contributions of bias and noise to scoring variability. RESULTS: Among 2,711 EEGs, 49% were from women, and the median (IQR) age was 55 (41) years. In total, experts scored 50,697 EEG segments; the median [range] number scored by each expert was 6,287.5 [1,002, 45,267]. Overall pairwise IRR was moderate (PA 52%, κ 42%), and majority IRR was substantial (PA 65%, κ 61%). 
Noise-bias analysis demonstrated that a single underlying receiver operating curve can account for most variation in experts' false-positive vs true-positive characteristics (median [range] of variance explained ([Formula: see text]): 95 [93, 98]%) and for most variation in experts' precision vs sensitivity characteristics ([Formula: see text]: 75 [59, 89]%). Thus, variation between experts is mostly attributable not to differences in expertise but rather to variation in decision thresholds. DISCUSSION: Our results provide precise estimates of expert reliability from a large and diverse sample and a parsimonious theory to explain the origin of disagreements between experts. The results also establish a standard for how well an automated IIIC classifier must perform to match experts. CLASSIFICATION OF EVIDENCE: This study provides Class II evidence that an independent expert review reliably identifies ictal-interictal injury continuum patterns on EEG compared with expert consensus.
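
The two agreement measures used as primary outcomes, percent agreement (PA) and chance-corrected agreement (κ), can be illustrated for a single pair of raters. The sketch assumes Cohen's kappa as the chance correction, which the abstract does not specify, and the labels are invented.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of segments on which two raters assign the same label."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Agreement beyond chance for two raters (Cohen's kappa)."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(
        counts_a[label] * counts_b[label] for label in set(rater_a) | set(rater_b)
    ) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1 - expected)


# Invented scores for four EEG segments from two hypothetical experts.
expert_1 = ["SZ", "SZ", "GPD", "GPD"]
expert_2 = ["SZ", "GPD", "GPD", "GPD"]
```

For these invented scores PA is 75% but κ is 50%, because half of the observed agreement would be expected by chance alone. Averaging such pairwise values over all expert pairs gives a pairwise IRR of the kind reported above.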


Subjects
Electroencephalography , Seizures , Humans , Female , Middle Aged , Reproducibility of Results , Electroencephalography/methods , Brain , Critical Illness
19.
iScience ; 25(9): 104970, 2022 Sep 16.
Article in English | MEDLINE | ID: mdl-35992304

ABSTRACT

The COVID-19 pandemic has caused devastating economic and social disruption. This has led to a nationwide call for models to predict hospitalization and severe illness in patients with COVID-19 to inform the distribution of limited healthcare resources. To address this challenge, we propose a machine learning model, MedML, to conduct the hospitalization and severity prediction for the pediatric population using electronic health records. MedML extracts the most predictive features based on medical knowledge and propensity scores from over 6 million medical concepts and incorporates the inter-feature relationships in medical knowledge graphs via graph neural networks. We evaluate MedML on the National COVID Cohort Collaborative (N3C) dataset. MedML achieves up to a 7% higher AUROC and 14% higher AUPRC compared to the best baseline machine learning models. MedML is a new machine learning framework to incorporate clinical domain knowledge and is more predictive and explainable than current data-driven methods.

20.
Patterns (N Y) ; 3(4): 100445, 2022 Apr 08.
Article in English | MEDLINE | ID: mdl-35465223

ABSTRACT

Clinical trials are crucial for drug development but often face uncertain outcomes due to safety, efficacy, or patient-recruitment problems. We propose the Hierarchical Interaction Network (HINT) to predict clinical trial outcomes. First, HINT encodes multi-modal data (drug molecule, target disease, trial eligibility criteria) into embeddings. Then, HINT trains knowledge-embedding modules using drug pharmacokinetic and historical trial data. Finally, a hierarchical interaction graph connects all of the embeddings to capture their interactions and predict trial outcomes. HINT was trained and validated on 1,160 phase I trials, 4,449 phase II trials, and 3,436 phase III trials. It obtained 0.665, 0.620, and 0.847 F1 scores on separate test sets of 627 phase I, 1,653 phase II, and 1,140 phase III trials, respectively. HINT significantly outperforms the best baseline method on most metrics. The benchmark dataset and codes are released at https://github.com/futianfan/clinical-trial-outcome-prediction.
